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Introduction 


We  are  interested  in  finding  an  adequate  model  for  a 
scalar  observation  set  f  y(s) ,s  =  (s^»s2^  ^  obtained  from 

a  homogeneous  Gaussian  random  field,  where  is  a  square 
grid  of  side  ,  so  that  1  s^  s-  ,  i  =  1,2.  The  random 
field  is  not  necessarily  isotropic.  Such  problems  are  of 
interest  in  various  applications  such  as  plant  modeling  [1], 
image  restoration  [2-3],  seismology  [4],  and  image  modeling 
[5-6] . 


In  situations  such  as  these,  any  observation  at  (i,j)  is 
statistically  dependent  on  the  observations  over  a  neighbor¬ 
hood  of  (i,j),  in  contrast  with  the  familiar  univariate  time 
series  models,  where  the  observation  at  any  instant  is  depen¬ 
dent  only  on  the  past  observations.  Neighborhood  models  seem 
to  be  more  appropriate  for  images  since  for  an  image  there  is 
no  essential  difference  between  the  neighbors  on  one  side  and 
those  on  the  other.  Our  prime  interest  in  this  paper  is  in 
developing  decision  rules  for  choosing  an  appropriate  neigh¬ 
borhood  from  among  a  given  set  of  neighborhoods.  For  instance, 
the  set  of  north,  south,  east  and  west  neighbors  constitutes 
a  particular  choice  of  neighborhood. 

The  main  approaches  to  choosing  neighbors  in  model  build¬ 


ing  are  the  following: 


1)  Pairwise  testing 

2)  Akaike's  information  criterion  (AIC) 

3)  The  Bayesian  approach 

The  reader  is  referred  to  the  statistical  literature  [7] 
for  a  discussion  of  pairwise  testing.  The  main  criticisms 
of  this  approach  are  that  the  resulting  decision  rules  are 
not  consistent,  i.e.,  the  probability  of  choosing  an  incor¬ 
rect  model  does  not  go  to  zero  even  as  the  number  of  obser¬ 
vations  goes  to  infinity.  Also,  the  decision  rules  are  not 
transitive,  i.e.,  given  three  hypotheses  C-,  ,  C 2,  and  C^,  if 
is  preferred  to  C2  and  C2  is  preferred  to  C^,  then  it 
does  not  follow  that  is  preferred  to  [8] . 

The  AIC  method  [9]  considers  the  so-called  AIC  statistics 
of^ the  given  observations  for  each  model.  The  best  model  is 
the  one  which  minimizes  the  AIC  statistic.  This  method  gives 
transitive  but  not  consistent  decision  rules  [1(3  . 

In  the  Bayesian  approach  [8]  of  fitting  models  to  data, 

various  possible  models  are  postulated  as  mutually  exclusive 

hypotheses  C^,  1  *  i  ^  r.  The  hypothesis  that  maximizes  the 

posterior  probability  density  P(C.|y(s),  s  fc  ft  )  is  chosen  as 

1  ~  b 

the  correct  model  with  minimum  probability  of  error.  This 
approach  involves  obtaining  an  expression  for  the  likelihood 
of  the  observations  and  integrating  it  over  the  parameters 
using  an  appropriate  prior  probability  density  function. 


In  this  paper  we  propose  a  Bayesian  method  for  finding 
a  neighborhood  model  for  a  random  field.  We  take  a  transform 
domain  approach  using  the  spectral  representation  for  the 
random  field.  Specifically,  using  the  asymptotic  Gaussian 
properties  of  the  finite  Fourier  transforms  (Z(A),  A  k  } 
where  is  a  set  of  discrete  frequencies  A  =  (A  with 

components  A^  =  2tk^/N^,  1  s.  k  £  N^,  i  =  1,2,  an  explicit 
expression  is  given  for  p(Z(A),  A  £  | 0  , ) ,  the  dependence 

on  the  parameters  appearing  through  the  spectral  density 
function  of  the  field. 

We  integrate  this  probability  density  function  w.r.t. 
arbitrary  but  regular  prior  probability  density  functions 
P  ( ~k ^  *"k ^  to  obtain  p  ( Z  ( A )  ,  A  £fipjck).  Using  this  expression 
and  the  prior  probabilities  P  (C^)  ,  1  k  s*  r  of  the  hypothesis, 
a  decision  rule  for  choosing  a  model  with  minimum  probability 
of  error  is  designed. 

The  usual  criticisms  against  the  use  of  prior  densities 
are  answered  by  showing  that  the  contributions  of  these  terms 
are  of  order  0(1)  and  hence  asymptotically  insignificant.  A 
decision  rule  suppressing  the  terms  involving  the  prior  den¬ 
sities  is  also  given.  Though  this  rule  does  not  have  the 
minimum  error  rate  property,  it  is  asymptotically  consistent. 

The  theory  developed  here  has  also  been  considered  in  an 
earlier  report  [5]  motivating  the  use  of  statistical  inference 


theory  for  image  modeling.  The  transform  domain  approach 
used  in  this  paper  can  be  easily  extended  to  include  moving 
average  and  autoregressive  moving  average  models  in  a  unified 
representation,  viz.,  the  spectral  density  function.  Various 
special  cases  of  interest,  including  random  fields  specified 
by  one-sided  models,  and  random  processes  represented  by  bi¬ 
lateral  and  unilateral  models,  are  considered. 

The  organization  of  the  paper  is  as  follows:  In  Section 
2,  an  explicit  expression  is  derived  for  the  probability  den¬ 
sity  of  the  transforms  of  the  observations  given  the  neighbor¬ 
hood  model  obeyed  by  the  observations.  In  Section  3,  the 
problem  of  finding  the  appropriate  neighborhood  is  posed  as 
a  class  selection  problem  and  decision  rules  are  designed 
for  choosing  this  neighborhood.  Section  4  discusses  the 
properties  of  the  decision  rules.  Some  special  cases  of  the 
theory  developed  here  are  considered  in  Section  5.  A  brief 
discussion  is  given  in  Section  6,  and  some  applications  are 
considered  in  Section  7. 


2.  Preliminaries 


A.  Definitions  and  Notation  :  We  are  given  a  set  of  obser¬ 
vations  fy(s),  s  =  (s^,S2)  £  ilg},  where  is  a  square  grid  of 
side  and  1  ^  s^  N^,  i  =  1,2.  The  random  field  is  homo¬ 

geneous  and  Gaussian  but  not  necessarily  isotropic.  A  sto¬ 
chastic  field  is  said  to  be  homogeneous  if  the  following 
condition  is  satisfied: 

R  ( s  ,  t)  =  E  (y  (s)  -  E  (y  (s)  )  )  (y  (t)  -  E  ( y  ( t )  )  ) 

=  R1(s  -  t) 

--i.e.,  the  covariance  function  is  translation  invariant. 

In  addition,  if  the  covariance  function  R(s,t)  is  also  invari¬ 
ant  to  rotation,  the  random  field  is  called  isotropic.  In 
general,  images  are  not  isotropic  and  hence  the  random  field 
models  of  interest  to  us  are  only  homogeneous  and  not  neces¬ 
sarily  isotropic. 

Consider  the  stochastic  field  (y(s),  s  £  ftg)  satisfying 
m 

E  (<f> ,  p)  :  y(s)  +  E  <j).y  (s  +  q.)  =  /pn  (s)  (2.1) 

k=l  ~k 

Vs  t  fig,  qk  t  Q,  Vk  =  1,  2,...m 

Q  =  <qk  =  (<3kl,qk2)'  k  =  ?  (0,0), 

qk^  are  integers) 

Here  (u(s),  s  t  S)g)  is  a  Gaussian  I.I.D.  sequence  with  zero 
mean  and  unit  variance.  In  what  follows,  for  simplicity,  s 
and  qk  will  be  denoted  by  s  and  qk,  dropping  the  vector  notation. 


Eq.  (2.1)  is  characterized  by  an  unknown  (m+1)  dimensional 
T  T 

vector  0  =  (<}>  ,p)  such  that  ¥■  0,  j  =  1,2,... m  and  p  >  0. 

Eq.  (2.1)  represents  the  dependence  of  an  observation  at 
location  (s^,s2)  on  its  neighbors  in  the  direction  specified 
by  Q.  When  q^  and  q^2  take  only  nonpositive  values  we  obtain 
models  where  the  observation  at  location  (s^,s2)  is  a  linear 
combination  of  the  observations  in  a  one-sided  neighborhood. 

When  s  and  q  are  scalars,  (2.1)  represents  a  one-dimensional 
autoregressive  model.  We  assume  that  the  coefficient  vector 
0  in  (2.1)  is  restricted  to  ensure  homegeneity  of  the  stochastic 
f ield . 

The  two-dimensional  z  transform  of  (2.1)  where  z  =  (z^,z2) 
is  a  complex  vector  given  by 

***  ^1  2  ~ 

H(z1,z2,4)  =  [1  +  E  <t>kzl  k±z2  KZ] 

k—  1 

and  the  spectral  density  function  evaluated  at  frequency 
A  =  (A^jA^)  of  the  field  is  given  by 

jA  jA 

Sy(A,0,P)  =  pNH(e  \e  ,0)||  (2.3) 

j  =  S=l.  va  t 

To  make  our  notations  clear,  we  consider  a  few  examples. 

In  what  follows,  we  drop  the  vector  notation  ~  for  A. 

Example  1:  East,  West,  North,  and  South  mode 1 : 

Q  =  { (1,0) ,  (0,-1) ,  (-1,0) ,  (0,1)  } 

The  equation  for  y(*)  is 


y(s1,s2)  +  ()>1y  (s1+l,  s2)  +  <{>2y  ( sx ,  s2~l) 

+  4>3y  (Sj^-l ,  s2)  +  ‘J’^y  ( s ^ ,  s 2+l)  =  /pu  ( s  j  s2 ) 
the  transfer  function  being  given  by 

H  ( Z1 '  z2 '  $)  =  l1  +  ^izi  +  (J,2Z2 1  +  ({)3zl1+^4z2j 


Example  2:  One-sided  models  [11-13] 

Q  =  {(0,-1),  (-1,-1),  (-1,0)},  the  corresponding 


equation  being 


y  ( s]_ »  s2)  +  <J>iy(si'S2-1)  +  <f)2Y  ^sl-1'  s2-1) 

+  03y(s1-l,s2)  =  /p  u(s1#s2) 

The  transfer  function  for  this  system  is  given  by 
H  ( ,  z2 , 4>)  =  [1  +  <b]_z21  +  ^2Z1^Z2^  +  ^ 


Example  3;  Time  series  models:  When  s  and  q  are  scalars  and 
the  set  Q  consists  of  positive  and  negative  integers  we 
obtain  bilateral  models  [1].  When  the  set  Q  consists  of 


negative  integers,  we  obtain  the  familiar  autoregressive 


models . 


V  , 


B .  Expression  for  p(Z(X),  X£f2x) 

In  this  section  we  are  interested  in  deriving  an  explicit 
expression  for  the  probability  density  of  the  transforms 
(Z(X),  X  t  of  the  observations  (y(s),  s  6.  ftg) ,  given 

that  the  observations  obey  the  model  in  (2.1). 

To  this  end,  we  first  obtain  an  expression  for 
p  ( Z  ( X )  ,  X  6  and  then  integrate  over  ( 4> ,  p )  by  using  an 

arbitrary  but  otherwise  regular  prior  probability  density 
p(<J>,p).  An  expression  for  p(Z(X),  X  6  £2  ^(tj) ,  p )  is  obtained 
by  using  the  stochastic  properties  of  finite  Fourier  trans¬ 
forms  [ 14-16 ] : 

Theorem  1 :  Consider  the  finite  Fourier  transforms  of  the 
observations  (y(s),  s  6  fig)  defined  over  a  square  grid  N.xN^. 
Denote  the  finite  Fourier  transform  of  (y(s) ,  s  fc  fig)  by 


T 

Z  (A)  =  Z  e"jx  sy(s)  ,  X  6  ^ 

s  £  SI 

b  j  =  /=T  (2.4) 

where  is  the  set  of  discrete  frequencies  X  =  (X^jX^)  with 
components  X^  =  2rk^/N^,  1  s»  k^  N^,  i  =  1,2.  Then  for  an 
infinite  observation  field,  the  finite  Fourier  transforms  are 
complex  normally  distributed  with  zero  mean  and  independently 
at  different  frequences  with  variances 


e (z (X)  z*  (X) )  =  s  u,$,p)  vx  6  nx 


(2.5) 


where 


S  U,$,  p)  =  p  |  I  H  (e 


jAl  jA2 


in  which 


|  |  ct  |  |  denotes  the  modulus  of  the  complex  variable  u. 
This  theorem  involves  two  approximations;  one  involves 
the  asymptotic  uncorrelatedness  of  the  random  variables 


(Z(A),  A  fc  for  different  A '  s,  and  the  other  involves 

equation  (2.6).  The  smoother  S^(A,4>,p)  is  in  the  vicinity 
of  the  discrete  frequency  pairs  A  6  ,  the  better  is  the 

approximation  for  moderate  values  of  N^.  If  the  spectral 
density  function  Sy(A,<j>,p)  were  constant,  equality  would 
hold  in  (2.6)  for  all  values  of  N^.  Otherwise  the  distribu¬ 
tion  theory  is  exact  only  when  the  number  of  observations  is 


infinite. 


Using  Theorem  1  an  expression  can  immediately  be  written 


for  the  joint  density  of  the  finite  Fourier  transforms: 
In  p  ( Z  ( A )  ,  A  fc  n^|<f>,p)  =  -  ^  fn2ir 

-4  Z  In  S  ( A  ,  4>,  p) 

*  a  tft  x  y 

-i  E  I  |  Z  (  A )  j  I  2/S  (A,<f>,p) 

a^a  y  ~ 

Substituting  (2.6)  in  (2.1)  we  have 

In  p  (Z  ( A )  ,  A  t  p)  =  -  ^  ln2vp 

.  3  A  ■,  j  A  _  _ 

-  j  Z  £n|  |  H  (e  \e  2 ,  <p )  |  |2 

Xkn\ 

-  h  2  I  I  Z  (A)  I  |2/|  !H(e3\eJ*2,4»)  |  |2 

AfcfiA 


(2.7) 


(2.8) 


It  is  interesting  to  compare  the  expression  obtained 
here  for  the  probability  density  of  the  transforms  of  obser¬ 
vations  and  the  expressions  obtained  in  Whittle  [1],  Whittle 
starts  with  an  exact  expression  for  the  likelihood  of  the 
noisy  variates  (u(s),  s  t  fig).  Since  for  a  general  neigh¬ 
borhood  model  the  Jacobian  of  the  transformation  from  the 
noisy  variates  u(.)  to  the  observations  y(.)  is  difficult  to 
evaluate,  an  approximate  expression  is  used  for  the  determinant. 
However,  the  expression  obtained  here  is  not  an  approximation 
to  the  likelihood  function  of  the  observations  as  in  [1]. 

The  desnity  function  considered  here  is  the  joint  density  of  the 
finite  Fourier  transforms  which  is  a  one  to  one  transformation 


with  Jacobian  unity  (though  a  general  proof  can  be  given  to 
establish  this,  a  simple  derivation  is  given  in  Appendix  II 
for  a  4x4  field).  Consequently,  the  estimates  of  <f> ,  p  obtained 
by  maximizing  p(Z(X),  X  t  fi^|$,p)  are  only  approximately  the 
maximum  likelihood  estimates.  Before  proceeding  further  we 
need  the  following  assumption. 


Assumption  1:  The  first  and  second  derivations  of 


E  In  |  \u(eJ  1,ej  2,<f>)  |  |  2 

Xfcfi, 


w.r.  t.  <p 


exist  for  all 


and  E 
X£fiA 

€  Rm. 


Z  (X)  |  |  VI  I  H  (e 


To  obtain  p(Z(X),  X€fi^)  we  integrate  p(Z(X),  X  ^  |  <J>  r  p ) 
over  (<p,p)  using  an  appropriate  prior  probability  density 


function.  We  do  not  make  any  specific  assumption  about  the 

structure  of  the  prior  probability  density  functions.  They 

T  T 

need  be  regular  but  otherwise  arbitrary.  Letting  0  =  (<t»  ,p) 

and  performing  the  integration 


p  ( Z  ( A )  ,  Afefl^)  =  // p(Z(A),  A«2x|<Kp) 

p  (4>,  p) d^dp  (2.9) 

we  arrive  at 

Theorem  2 :  As  the  rectangle  of  observations  becomes 
large  in  all  dimensions,  the  probability  density 
p(Z(A),Afcfi  )  is  given  by 

In  p(Z  (A)  ,  A£ftA)  =  -  |£np  - 

1  Z  £n||H(e  \e  2,4>)||2  +  £np(0) 

A€^x 

+  (m+l)£n2TT  -  (m+l)£n  N  -  a(N) 

-  i^n  [detV2  GO)  ]  +  0(1/N)  (2.10) 

2  Vj  ~  0=0 

where 


0T=  (4>T,P  )  =  max  G(4>,p) 

4>tRm,  p>o 

j  A.  jA  2 

G(4>,p)  =  -[£np  +(1/N)  Z  £n||H(e  ,e  ,4>)|| 

A  fcfi. 

9  jA.  jA 

+  (1/N  )  Z  |  |  Z  (  A )  |  |  /  |  |  H  (e  \e  ,<|>)  I  I  1 

Atfi, 

and 


(2.11) 


(2.12) 


a  (N)  =  0.5(N  +  N£n  2ir) 


(2.13) 


and  0(1/N)  denotes  a  deterministic  constant  term  behaving 
like  k^/N  for  large  N  where  k^  is  independent  of  N. 

Theorem  2  gives  an  explicit  expression  for  the  probability 
density  of  the  transforms  of  observations  from  a  stochastic 
field  characterized  by  the  spectral  density  function 


3. 


Decision  rules  for  the  choice  of  neighborhoods 


For  an  image  it  is  reasonable  to  assume  that  an  obser¬ 
vation  at  (i,j)  will  not  significantly  depend  on  distant 
neighbors.  Hence  we  restrict  our  allowable  neighborhood  set 
to  a  maximum  of  eight  neighbors,  East,  West,  North,  and  South 
and  the  four  diagonal  neighbors.  Thus  our  problem  is  to 

3 

find  an  appropriate  set  of  neighbors  among  the  possible  2 
neighborhood  sets  for  the  given  image. 

A.  Definition  of  classes;  We  formulate  the  problem  of 
choice  of  neighborhood  as  a  class  selection  problem  and  derive 
optimal  decision  rules.  A  class  is  defined  as  a  set  of  models 
having  the  same  neighborhood  set  Q  but  differing  in  the  para¬ 
meters  $  or  p.  The  class  consists  of  models  of  the  form 


C.  : 

l 


y  ( s )  +  £  <f> .  .y  (s  +  q, 

j=l  13  k 


)  =/pTu(s) 


q,£Q.  , 


qk  ?  (0,0)  (3.1) 


such  that  ^  0,  p^>0,  j  =  1,2,  ...nr,  i  =  1,2, ...r,  where 

r  denotes  the  number  of  classes.  Thus  a  class  consists  of 
an  infinite  number  of  models  with  the  same  neighborhood.  The 
given  set  of  observations  (y(s) ,  s  £  fig)  is  said  to  obey 


class  if  (y(s),  s  t  Dg)  obeys  only 


one  model  in  C . . 

l 


Two  classes  C.  and  C.  are  said  to  be  mutually  exclusive  if 
13 

the  neighborhoods  they  represent  differ  in  at  least  one 
neighbor . 


Given  r  mutually  exclusive  classes  (equivalently,  distinct 
neighborhoods)  C^,  i  =  1,2,...  r,  and  a  set  of  observations 
(y(s),  s  fc  ilg)  ,  our  aim  is  to  find  the  most  appropriate  class 
for  (y  ( s)  ,  s  fc  f2s )  .  The  criterion  for  appropriateness  is 
usually  defined  as  a  suitable  criterion  function  and  a  decision 
rule  for  assigning  (y(s),  s  €  Qg)  to  one  of  the  r  classes  is 
designed  to  minimize  the  criterion  function.  The  criterion 
function  can  be  chosen  to  reflect  the  particular  needs  of  the 
problem,  such  as  minimizing  the  average  probability  of  error, 


etc . 


B.  Expression  for  P(C.|Z(X),  X  t  SL  ) 

X  A 

We  first  compute  an  expression  for  P(Cb|Z(X),  X  t  , 
the  posterior  probability  density  of  the  transforms  of  the 
given  data  having  been  generated  by  some  model  in  CL,  for 
every  i,  i  =  l,2,...r.  Subsequently,  we  derive  optimal 
decision  rules  to  minimize  the  probability  of  error  and 
discuss  simplifications  of  the  decision  rules. 

Let  0^  =  (<f>^  ,  p ^ )  , (4> i:L ,  <Jji2 , . . .  4>im  and  p  ( 0  i  1  Ci )  be  the 
prior  probability  density  function  of  the  parameters  under 
class  C..  An  expression  for  P(C.|Z(X),  X€ft.)  is  given  by 

Theorem  3 :  Let  the  observations  obey  the  class  C..  Then 
the  posterior  probability  density  of  the  transforms  of  the 
observations  is  given  by 

£n?(Ci|Z(X),  X£ftx)  =  -  |^nPi 

-  ((m.+l)/2Un  N  -  \  l  £n||H  (ej  1  ,e  2,$.)||2 
1  ^  X€fiA  1  -1 

+  In  p(?.  ,p.  |Ci)  -  if.n[det  V2  Q  G(0)]  -a(N) 

1  j  0  =  0 


+  ((mk  +1)  /  2)  In  2tt  +  In  P(Ci)  -  In  p(Z(X),  +  0(1/N) 

(3.2) 


where 

a  (N)  =  0. 5  (N+N£  n2?r) 

0\T  =  (<t>T,p.)  =  max  { G.  (<J>.  ,  p.  )}  (3.3) 

-i  i  mi  i  ~i  l 

Pi>0 


~  1 


G  •  ( <J> .  ,  p  •  )  =  -Knp.  +  ( 1/N)  T.  *n||H.(e  ,e 
1  -1  1  1  XtftA 

(1/Np)  E  |  | Z  (X)  |  | 2/|  |Hi  (e3  1 ,e  2,$i)||2], 

Xtfix 

r 

p (Z (X)  ,  XtfiA)  =  E  p(Z(X),  Xtfix|Ck)P(Ck) 


P(C.),  i  =  l,2,...r  are  the  prior  probabilities  of  the  classes 
Proof:  This  follows  from  Theorem  2  and  Bayes'  rule, 


p(ci|z(X),  xenx) 


P(z(X),  xenx |ci)p(ci) 
Ep (Z (X) ,  Xfcflx|Ck)P(Ck) 


(3.6) 


* 


C.  Decision  rules:  Consider  a  0-1  loss  function  L  which 
assigns  unit  cost  to  a  wrong  assignment  of  classes  and  zero 
cost  otherwise,  i.e. 


L  [C .  ,  d  (y  (s)  ,  stil  )  =  C.)]  =  0 

1  a  J 

C  .  =  C  . 
i  3 

(3.7) 

=  1 

ci  *  cj 

Since  the  finite  Fourier  transformation  is  one  to  one, 
the  cost  of  wrong  assignment  of  the  observation  set 
(y  ( s)  ,  sfc  fig)  is  the  same  as  the  cost  of  wrong  assignment 

of  the  set  (Z(A),  Afcft^).  Our  intention  is  to  choose  the  decision 
rule  to  minimize  the  risk  function,  i.e.,  minimize  J  (d) ,  where 
r 

J  (d)  =  E  P(Ci)/L[Ci,  d  (Z  ( A)  ,  Atfix)]  X 
i  =  l 


p  (  Z  (  A  )  ,  Atfix|ci)d|z(A)  ,  ACfi-J  (3.8) 

Substituting  the  loss  function  in  (3.7)  we  have 

J (d  ( Z  (A)  ,  Afcfl,  )  =  C  .) 

K  3 

r 

=  |  I  P  (C  .  |  Z  ( A )  ,  A£ftx)  p  ( Z  ( A )  ,  A  CQ  A  )  d  |  Z  (  A  )  ,  AfcflJ 
i  =  l  1 
i^j 

(3.9) 


The  optimal  decision  rule  is 
d*  (Z  (A)  ,  A£fiA)  =  CR  = 

argument  sup  { P  (C  .  |  Z  (  A )  ,  A  CS2A )  (3.10) 

i 

Substituting  for  the  posterior  probability  function 
from  Theorem  3  and  omitting  common  terms  we  have  the  following 


decision  rule: 


Choose  class  C^*  if 

i*  =  arg{  {h  .  (Z  { A )  ,  A£ft,}} 

11  A 

where 

h.  (Z  (X)  ,  Afcft.  )  =  ^  •f.n'p.  +  m.  in  N 

1  A  Z  1  1 

,  jA  jX  _ 

+  i  £  £n  |  |  H .  (e  i,e  ,  cf>  )  \\z  +  m.in  2tt 

z  xenx 

-  in  p (¥. |C . ) -  y£n[det  V2  G(0)]|  in  P(C.) 

-1  1  2  0i0i  ~  0=0  1 
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D.  Simplified  Decision  Rules:  The  decision  rule  given 
in  (3.12)  involves  arbitrary  quantities  such  as  prior  proba¬ 
bility  densities,  which  for  the  problem  considered  here  are 
not  known.  Hence  we  suggest  a  decision  rule  in  which  the 
prior  densities  are  suppressed.  The  decision  rule  no  longer 
minimizes  the  average  probability  of  error  but  can  be  shown 
to  be  asymptotically  consistent  using  a  proof  similar  to  one 
found  in  [ 17 ] . 

The  simplified  decision  rule  is: 

Choose  class  i*  if 

i*  =  arg  { {g  .  (Z  (  A  )  ,  ACft,}}  (3.13) 

11  A 

where 

-  jl, 

g^  (Z  (  \ )  ,  \£U^)  =  Nf  np  .  +  E  £n|  |  H.  (e  ,e  ^  .  )  |  |  +  m  In  N 


(3.14) 


Eq.  (3.13)  can  be  rewritten  as 

g  •  ( Z  ( A )  ,  Atq  )  =  N-Cn  p*  +  m.-fn  N  (3.15) 

1  A  1  K 

where 

*  —  1  ii  13 ^ -1  j ^ p  _  2 

In  p  =  (n  p,  +  i  E  £n  |  |  H .  (e  1 ,  ,*.)  (3.16) 

1  1  N  A6fix  1  "1 

The  form  of  the  decision  rule  in  (3.16)  is  characteristic 

of  the  Bayesian  approach  [8] [17-19]. 
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4 .  Properties  of  the  decision  rule 


Asymptotic  consistency:  One  of  the  important  properties 
of  a  decision  rule  is  the  consistency  property.  A  decision 
rule  is  said  to  be  asymptotically  consistent  if  the  probabi¬ 
lity  of  choosing  an  incorrect  model  given  the  correct  model 
goes  to  zero  as  the  number  of  observations  goes  to  infinity. 
We  do  not  give  an  explicit  proof  for  the  consistency  of 
the  decision  rule  suggested  in  the  previous  section.  A  proof 
similar  to  that  in  [17]  can  be  given  to  establish  asymptotic 
consistency  of  the  decision  rule. 

Generality :  The  theory  developed  here  is  valid  for  auto¬ 

regressive  spatial  models.  The  theory  can  be  easily  extended 
to  include  moving  average  and  autoregressive  moving  average 
models . 

Parsimony:  The  expression  for  the  decision  function 

gk(z(A),  brings  out  the  disadvantage  of  having  too 

large  a  value  for  m  .  If  we  increase  m  ,  then  p?  decreases, 

causing  a  decrease  in  £npt  .  Thus  N£ np  *  and  m,_£nN  balance 

1  1  K 

each  other.  This  illustrates  the  desirability  of  keeping  the 
unknown  parameters  to  a  minimum. 

Transitivity :  The  decisions  regarding  pairwise  comparison 

of  the  classes  are  transitive.  This  is  because  the  decision 
function  gk(z(A),  does  not  depend  on  any  parameter 

outside  class  Ch  . 


5. 


Special  cases 

In  this  section  we  consider  two  special  cases  of  the 
general  theory  developed  in  the  previous  section. 

Case  (i) :  Random  fields  represented  by  one  sided  models: 
For  these  cases  [11-13],  we  have  from  an  extension  of  a 
relation  that  is  valid  for  a  weakly  stationary  process 

1  IT  TT 

l np  =  — 2  J  J^n  s  (A,cj>,p)dA  (5.1) 

4tt  it  it  ^ 

Substituting  for  S  (A,<j>,p)  from  (2.3),  we  have 

TT  TT  .  . 

£np  =  — 2  /  /  (^nP  -  £n|  Ihje-5  ,<t>'  |  f)dA  (5.2) 

4  TT  IT  TT 

TT  TT 

=  -ij  4TT2£np - f  jfni  |H(ejA,$.)  |  )2dA  (5.3) 

4  TT  4  TT  IT  TT 

Hence  we  have 

TT  TT  .  . 

/  /  ln\  jlKe3  ,  <p)  |  1  2dA  =  0  (5.3) 

TT  TT 

Approximating  the  double  integration  by  double  summation, 

(5.3)  reduces  to 


j  A  _  j  A  ~ 

T.  £n  |  |  H  (e  i,e  ,  <p  )  |  |  =  0 

A  fcf) . 

Using  (5.4),  the  simplified  decision  rule  becomes: 
the  model  k*  if 

k*  =  argument{min  g^(z(A),  A€fi^) 
where 


(5.4) 

Choose 


gk(Z(A),  AtfiA) 
=  Nfnp^  +  m^£n  N 


Thus  the  main  difference  between  the  one-sided  and 


neighborhood  models  is  due  to  the  term 

l  ln\  |H  (ej  1,ej  2  A k)  |  |2 
K  ~k 

Case  (ii) :  One  dimensional  series  represented  by 
bilateral  and  unilateral  methods  [ 1 ] : 

For  this  case,  s  and  q  are  all  scalars  and  q  takes  posi¬ 
tive  and  negative  values.  The  theory  developed  for  random 
fields  carries  through  for  this  case  analogously.  By  replac¬ 
ing  the  summation  over  fields  by  a  single  summation  we 
obtain  the  results  derived  in  [5]. 


The  inference  of  stationary  Gaussian  random  fields  has 
been  previously  considered  by  Whittle  [1]  and  Larimore  [41 . 
A  useful  entity  to  work  with  for  statistical  inference 


purposes  is  the  likelihood  of  the  observations.  For  models 
where  the  observation  at  (i,j)  depends  only  on  one  sided 
neighborhoods,  the  Jacobian  of  the  transformation  from  noise 
variates  to  observations  is  unity  and  hence  the  likelihood 
function  can  be  easily  written.  However,  for  models  where  r he 
observation  at  (i,j)  depends  on  a  neighborhood,  the  Jacobian 
of  the  aforementioned  transformation  is  not  unity  and  is 
difficult  to  evaluate.  By  approximating  the  Jacobian  of 
the  transformation  Whittle  obtains  an  approximate  expression 
for  the  likelihood  of  observations  of  a  random  field.  Like¬ 
lihood  ratio  tests  and  significance  levels  have  been  used 
to  identify  a  neighborhood.  The  decision  rules  using  pair¬ 
wise  hypothesis  tests  are  not  consistent  and  transitive. 
Moreover,  Whittle's  procedure  becomes  complicated  when  models 
other  than  autoregressive  are  considered.  Lastly,  even  for 
autoregressive  models,  the  evaluation  of  the  Jacobian  is 
a  nontrivial  task. 

Larimore  [4]  has  reconsidered  the  problem  of  inference 
of  random  fields.  The  procedure  developed  in  [4]  uses  spectral 
representation  of  the  random  field.  Instead  of  the  likelihood 


of  the  observations,  the  probability  density  of  the  finite 
Fourier  transforms  is  used  for  inference.  As  discussed  in 
Section  2,  the  approximation  involves  uncorrelatedness  and 
variance  of  the  finite  Fourier  transforms.  However,  the 
term  corresponding  to  the  Jacobian  of  the  transformation 
is  easy  to  compute  once  the  neighborhood  is  specified.  Also, 
the  AIC  criterion  has  been  used  for  model  identification. 

It  has  been  shown  [10]  that  even  in  the  one-dimensional 
case  the  AIC  rule  is  inconsistent;  i.e.,  the  probability 
of  choosing  a  higher  order  model  than  the  true  model  is  not 
zero  even  for  the  case  of  infinitely  many  observations. 
Consequently,  the  use  of  AIC  rule  for  random  field  model 
identification  is  not  desirable. 

The  theory  developed  here  yields  asymptotically  (weakly) 
consistent  decision  rules  for  choosing  neighborhoods  of 
models.  The  simplified  decision  rule  does  not  involve  any 
arbitrary  quantities  such  as  prior  densities. 

One  of  the  problems  not  considered  in  this  paper  is 
the  computational  aspects  of  the  method.  Maximization  of  the 
probability  density  function  p(Z(X),  could  be 

difficult  for  the  following  reasons: 

1)  Starting  values  for  the  maximization  algorithm  are 
difficult  to  obtain. 

2)  The  problem  is  really  a  constrained  maximization, 
the  constraints  being  laid  by  the  homogeneity  requirements. 


3)  Often  the  function  has  multiple  maxima,  and  it  is 
very  important  to  find  the  absolute  maxima.  Additional  work 


should  be  done  in  this  area. 

The  expressions  used  for  the  probability  density  of  the 
transforms  of  observations  are  approximate.  Recently,  an 
exact  expression  for  the  likelihood  function  of  the  obser¬ 
vations  from  a  homogeneous  random  field  characterized  by  a 
parametric  model  as  in  (2.1)  has  been  developed  in  [  6  ] . 
This  formulation  lends  itself  to  a  theoretical  treatment  of 
many  problems  in  the  area  of  image  processing,  such  as 
modeling,  segmentation,  etc. 
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7. 


Applications 


The  theory  developed  here  will  find  applications  in 
image  modeling  and  texture  characterization. 

Earlier  attempts  at  image  modeling  have  used  a  one¬ 
dimensional  analysis  on  a  time  series  obtained  by  concatena¬ 
tion  of  successive  rows  [13]  [20].  This  is  clearly  inadequate 
since  intuition  alone  suggests  that  image  models  should  be 
inherently  two-dimensional.  Recently,  two-dimensional  but 
one-sided  neighborhood  models  for  images  have  been  suggested 
[21-  22].  Since  for  images  there  exists  no  preferred  direc¬ 
tion,  one-sided  models  are  not  sufficient  to  characterize 
images . 

Recently,  stochastic  partial  differential  equations  have 
been  suggested  for  images  in  [2],  No  effort  has  been  made 
to  statistically  infer  the  particular  model. 

Efforts  have  been  directed  towards  developing  two-dimen¬ 
sional  models  in  [23  -24].  An  observation  at  position  (i,j) 
is  assumed  to  statistically  depend  on  neighbors  on  every 
side  within  a  window  of  size  (2M+1) (2N+1).  Instead  of  using 
a  truly  two-dimensional  procedure  to  infer  the  window  size, 
the  constants  M  and  N,  determined  by  using  Akaike's  FPE 
criterion  for  a  one-dimensional,  unilateral  autore¬ 
gressive  model,  are  used  to  determine  the  window  size. 

In  this  paper  we  have  proposed  using  the  theory  of 


statistical  inference  to  choose  the  neighborhood  for  an 
image.  The  choice  of  neighborhood  models  under  consideration 
could  be  arbitrary  as  long  as  they  are  mutually  exclusive. 

The  models  developed  should  be  useful  in  coding  [ 6 ] , [ 13 ] , [ 24 ] , 
segmentation  [6],  [23]  and  restoration  [3],  [6]  of  images  as 
well  as  in  texture  characterization. 


APPENDIX  I 


We  prove  Theorem  2. 

Consider  equation  (2.8),  repeated  below 
£n  p  (Z  (A)  |  <j>,  p) 

=  -^£n2iTp-y  E  [-£n  |  |  H  (e  1,  e  2  ,  <p)  |  |  2 

+  (1/p)  |  |Z(A)  |  |2/|  iHCe^1^^2,^)  |  |2]  (1) 

or 

p(Z  (A)  ,  |  ,  p ) 

=  (l/2ir)N/2exp{-  ~£np  -  ^  E  £n()H(e  1,e  2 , 4> 1  l  2  - 

2  z  A02a 

(l/2p)  Z  |  |  Z  ( A)  |  |  2/  [  |  II  (eD  Al, ej  >2  ,  ^)  |  [  2  > 

=  (1/2’t)N/2  exp[|  G($,p)]  (2) 

where 

G  (<)> ,  p)  =  -[£np  +  i  Z  /n  1  |  H  (e"1  1,eJ  2 , 4> )  |  |  2 

N  Acn, 

^  j  \  j  ^ 

+  (l/NP)  E  |  |  Z  ( A)  |  |  2/ |  |  H  (e  1,e  2,<t>)'||2] 

A  i 
T  A  T 

Let  0  =  (0 , p) 

To  compute  p(Z(A),  ACfi^)  we  integrate  p(Z(A),  Afcfi.|0)  over 
0  by  using  a  prior  probability  density  p(0): 


p  (Z  (X) ,  xenA) 


=  /p(Z(X)f  A  €«x  |  6 )  p(0)d0 
Substituting  (2)  in  (4)  we  have 
p(Z(A),  xe^x) 

=  (l/2Ti)N/2/exp[|  G(0)  ]p(0)d0 

Expanding  G(0)  in  Taylor's  series  in  0  about  0 

0  =  max  G ( 0 ) 

0 

we  have 

LHS  of  (5) 

=  (l/27r)N/2/exp(|)  [G  C0T)  +  [V  G(0)]T_(0-0) 

~  ~  0=0  ~  ~ 

+  (0-e)T[v2  0  g(0)]  _(e-e)  +...]P(0)d0 

(6) 

By  using  the  definition  of  F  the  linear  term  in  the  exponent 
vanishes.  Thus  the  LHS  of  (6)  reduces  to 
=  (l/2TT)N/2exp[  (|)G(S)  ]  x 

/exp[(y)  (0-F)T(V2  G  ( 0 )]  _  (0-0)  +...]P(0)d0 

0 i 0  j  ~  0  =  0  ~  ~  ~  ~ 

=  (27)N/2exp(  (|)G(0)  Jp(¥)  (^.)(m+D/2 

x  - - -  +  0  ( 1/N)  (7) 

{det[V2  (G  (0)  )  ]  }1/2 

0i6j  ~  0=0 


(4) 


(5) 

=  0  where 


APPENDIX  II 


Lemma :  The  Jacobian  of  the  transformation  from  the  obser¬ 

vation  set  (y(s),  sfcfig)  to  the  finite  Fourier  transform 


( Z  ( A )  ,  A  fcft  x  )  is 

unity . 

Proof:  For  simplicity,  we 

consider  a  4x4  case. 

z(A1#a2)  = 

(N,  )  1  S  e' 

-j  (A1s1+X2s2)y(s^^^) 

For  a  4x4  case. 

Nj_  =  4, 

=  {i/j} 

3>i, 

j&4 

and  =  {  4  , 

j*4 

In  matrix  notation, 

Z  =  J  Y 

where  Z  and  Y  are  vectors  of  finite  Fourier  transforms  and 
observations  arranged  in  lexicographic  order.  The  matrix 
J  (16x16)  can  be  written  as 


J  =  (1/4) 


_  ■  H 
e  32  A 


e  J2  A 


-jil 

e  J  2  A  e 


.  7T 

.  2tt 

.  3u 

~32  A 

e”3  2  A 

e 

3  2 

.  3  7T 

.  TT 

.  3  TT 
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Using  Kronecker  product  notation, 

J  =  B  x  A 

where 

-j  — 

B  =  e  J  2  A 

Hence 

det  B  =  det  A 

From  a  Theorem  regarding  the  characteristic  roots  of 

Kronecker  products  [25]  the  characteristic  roots  of  AxB 

are  a^bj ,  where  a^  are  the  characteristic  roots  of  A  and 

b.  are  the  characteristic  roots  of  B.  Hence, 

3 

det  J  =  ( 1/4 )  16  IT  a  .  b  .  =  (l/4)l6(det  A)  4  (det  B)  4 

lsi,j&4  1  3 

=  1 


by  direct  evaluation  of  det  A. 
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